Syntactic Preprocessing for Statistical Machine Translation

Author

  • Nizar Habash
Abstract

We describe an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies, which are word-aligned with target-language words in a parallel corpus, are used to automatically extract syntactic reordering rules in the same spirit as Xia and McCord (2004) and Zhang et al. (2007). The extracted rules are used to reorder the source-language side of the training and test data. Our results show that when using monotonic decoding and translations for unigram source-language phrases only, source-language reordering gives very significant gains over no reordering (a 25% relative increase in BLEU score). With decoder distortion turned on and with access to all phrase translations, the differences in BLEU scores diminish. However, an analysis of sentence-level BLEU scores shows that reordering outperforms no reordering in over 40% of the sentences. These results suggest that the approach holds considerable promise, but that much more work on Arabic parsing may be needed.
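As a rough illustration of the rule-extraction idea in the abstract, the sketch below learns reordering rules from word-aligned dependency fragments and applies them to the source side. It is a simplified toy, not the paper's actual procedure: it assumes each dependent is a single word rather than a subtree, that every word has exactly one aligned target position, and that the most frequent target order wins. The function names, the labels `V`, `SBJ`, and `OBJ`, and the transliterated example sentence are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_rule(head, children, alignment):
    """Return (pattern, order) for one head and its dependents.

    head and each child are (source_index, label) pairs; alignment maps a
    source index to a target index. Returns None if any word is unaligned.
    """
    nodes = sorted([head] + list(children))  # source-side word order
    if any(idx not in alignment for idx, _ in nodes):
        return None
    pattern = tuple(label for _, label in nodes)
    # Permutation of pattern positions that yields the target-side order.
    order = tuple(sorted(range(len(nodes)),
                         key=lambda k: alignment[nodes[k][0]]))
    return pattern, order


def learn_rules(examples):
    """Keep the most frequent target order observed for each pattern."""
    counts = defaultdict(Counter)
    for head, children, alignment in examples:
        rule = extract_rule(head, children, alignment)
        if rule is not None:
            pattern, order = rule
            counts[pattern][order] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}


def apply_rule(tokens, pattern, rules):
    """Reorder tokens by the learned rule; leave them unchanged if none."""
    order = rules.get(pattern, tuple(range(len(tokens))))
    return [tokens[k] for k in order]


# Toy data (assumed): verb-subject-object source order aligned to a
# subject-verb-object target order in two of three fragments.
examples = [
    ((0, "V"), [(1, "SBJ"), (2, "OBJ")], {0: 1, 1: 0, 2: 2}),
    ((0, "V"), [(1, "SBJ"), (2, "OBJ")], {0: 1, 1: 0, 2: 2}),
    ((0, "V"), [(1, "SBJ"), (2, "OBJ")], {0: 0, 1: 1, 2: 2}),
]
rules = learn_rules(examples)
reordered = apply_rule(["kataba", "al-waladu", "risalatan"],
                       ("V", "SBJ", "OBJ"), rules)
```

In this toy setup the learned rule moves the subject in front of the verb, so the reordered source sentence can be translated more nearly monotonically. A realistic system would operate on full subtrees and would have to cope with parser and alignment errors, which this sketch ignores.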


Similar resources

Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation

Syntactic Reordering of the source language to better match the phrase structure of the target language has been shown to improve the performance of phrase-based Statistical Machine Translation. This paper applies syntactic reordering to English-to-Arabic translation. It introduces reordering rules, and motivates them linguistically. It also studies the effect of combining reordering with Arabi...


Improving Phrase-Based SMT with Morpho-Syntactic Analysis and Transformation

This paper presents our study of exploiting morpho-syntactic information for phrase-based statistical machine translation (SMT). For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on Bayes’ formula. The model is trained using a bilingual corpus and a broad coverage parser of the source language. T...


Rule-based Syntactic Preprocessing for Syntax-based Machine Translation

Several preprocessing techniques using syntactic information and linguistically motivated rules have been proposed to improve the quality of phrase-based machine translation (PBMT) output. On the other hand, there has been little work on similar techniques in the context of other translation formalisms such as syntax-based SMT. In this paper, we examine whether the sort of rule-based syntactic ...


Coupling Hierarchical Word Reordering and Decoding in Phrase-Based Statistical Machine Translation

In this paper, we start with the existing idea of taking reordering rules automatically derived from syntactic representations, and applying them in a preprocessing step before translation to make the source sentence structurally more like the target; and we propose a new approach to hierarchically extracting these rules. We evaluate this, combined with a lattice-based decoding, and show improv...


The POSTECH Statistical Machine Translation Systems for NTCIR-7 Patent Translation Task

This paper describes the POSTECH statistical machine translation (SMT) systems for the NTCIR-7 patent translation task. We entered two patent translation subtasks: Japanese-to-English (KLE-je) and English-to-Japanese (KLE-ej) translation. The baseline systems are derived from a common phrase-based SMT framework. In addition, for Japanese-to-English translation, we adopted two kinds of methods. ...




Publication date: 2007